Cpu memory graph break #3886

cehongwang · 2025-11-04T20:05:10Z

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist:

My code follows the style guidelines of this project (You can use the linters)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas and hacks
I have made corresponding changes to the documentation
I have added tests to verify my fix or my feature
New and existing unit tests pass locally with my changes
I have added the relevant labels to my PR so that relevant reviewers are notified

github-actions

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/_compiler.py	2025-11-04 20:05:23.825034+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/_compiler.py	2025-11-04 20:05:55.253944+00:00
@@ -876,15 +876,14 @@
    # This is done to release CPU memory.
    for attr in dir(gm):
        if attr.startswith("_frozen_param"):
            delattr(gm, attr)

-
-
    from torch_tensorrt.dynamo.conversion._ConverterRegistry import DYNAMO_CONVERTERS
+
    DYNAMO_CONVERTERS.disallowed_targets = set()
-    
+
    for name, _ in partitioned_module.named_children():
        submodule = getattr(partitioned_module, name)
        # filter on the GraphModule
        if not isinstance(submodule, torch.fx.graph_module.GraphModule):
            continue

narendasan

Do you have a test case or something to demonstrate this feature?

narendasan · 2025-11-04T20:10:40Z

py/torch_tensorrt/dynamo/partitioning/_adjacency_partitioner.py


 logger = logging.getLogger(__name__)
+NON_BREAKABLE_OP_LISTS = [
+    ["addmm", "addmm"],


Just a note for later implementation.

This should use an actual subgraph definition.

It should use PyTorch op targets, not strings.

addmm should be decomposed, right? So the graph we want is mm -> add.

There should be a user-facing API to modify this list, similar to what we have for passes.
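The points above could be sketched roughly as follows. This is a minimal illustration, not the partitioner's actual code: `register_non_breakable_pattern` and `is_breakable_after` are hypothetical names, and a pattern is simplified to a linear chain of op targets rather than a real subgraph definition. The `_Op` objects stand in for real targets such as `torch.ops.aten.mm.default` and `torch.ops.aten.add.Tensor`, so the sketch runs without PyTorch installed.

```python
from typing import Hashable, List, Sequence, Tuple

# Hypothetical registry: each entry is an ordered chain of op targets that
# the partitioner should keep inside a single subgraph.
_NON_BREAKABLE_PATTERNS: List[Tuple[Hashable, ...]] = []


def register_non_breakable_pattern(targets: Sequence[Hashable]) -> None:
    """Register a chain of op targets that must not be split (hypothetical API)."""
    pattern = tuple(targets)
    if len(pattern) < 2:
        raise ValueError("a pattern needs at least two ops")
    if pattern not in _NON_BREAKABLE_PATTERNS:
        _NON_BREAKABLE_PATTERNS.append(pattern)


def is_breakable_after(node_targets: Sequence[Hashable], index: int) -> bool:
    """Return True if a break between index and index + 1 splits no pattern."""
    for pattern in _NON_BREAKABLE_PATTERNS:
        n = len(pattern)
        # Check every window of length n that straddles the proposed break.
        for start in range(max(0, index - n + 2), index + 1):
            if tuple(node_targets[start:start + n]) == pattern:
                return False
    return True


class _Op:
    """Stand-in for an op target; any hashable object works as a target."""

    def __init__(self, name: str) -> None:
        self.name = name


MM, ADD, RELU = _Op("mm"), _Op("add"), _Op("relu")

# addmm decomposed into mm -> add, registered as one non-breakable chain.
register_non_breakable_pattern([MM, ADD])

graph = [RELU, MM, ADD, RELU]
break_before_mm = is_breakable_after(graph, 0)   # relu | mm
break_inside_pair = is_breakable_after(graph, 1)  # mm | add
break_after_add = is_breakable_after(graph, 2)   # add | relu
```

Comparing targets by identity rather than by name avoids accidental matches between ops that share a string name but differ in overload.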

narendasan · 2025-11-04T20:11:15Z

py/torch_tensorrt/dynamo/partitioning/_adjacency_partitioner.py

-
-    def calculate_num_of_break(self, subgraphs: List[Subgraph]) -> int:
+
+    def calculate_size_budget(


Should there be an API to define this manually?

Yeah, I think so. For now you can just hardcode it and experiment with the value.

py/torch_tensorrt/dynamo/partitioning/_adjacency_partitioner.py

narendasan · 2025-11-06T20:02:59Z

We should think about using this technique for refit vs. non-refit.
Make the refit APIs work across graph breaks.

narendasan · 2025-11-06T20:04:57Z

Improve usability by automating the nn.Module -> atomic FX graph conversion.

py/torch_tensorrt/dynamo/partitioning/fusion_patterns.py

narendasan · 2025-11-07T21:18:59Z

py/torch_tensorrt/dynamo/partitioning/fusion_patterns.py

+        return x
+
+
+All_FUSION_PATTERNS = [


We could cache the graphs if we run the symbolic trace at registration time.

Do you mean tracing the graphs every time the program starts? Wouldn't that add unnecessary latency when there is enough CPU memory? I am thinking we could use an LRU cache or something similar, so that tracing is lazily initialized and runs only once.
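The lazy, trace-once idea from this reply could look roughly like the sketch below. It is an assumption-laden illustration: `register_fusion_pattern`, `get_traced_pattern`, and the pattern name `"mm_add"` are hypothetical, and `_trace` is a stand-in for an expensive call such as `torch.fx.symbolic_trace` so the example runs without PyTorch.

```python
import functools
from typing import Any, Callable, Dict, List

# Hypothetical registry mapping a pattern name to a zero-argument factory.
_PATTERN_FACTORIES: Dict[str, Callable[[], Any]] = {}

# Records each real trace; present only to make the caching visible.
trace_calls: List[str] = []


def _trace(name: str, module: Any) -> Any:
    # Stand-in for an expensive call such as torch.fx.symbolic_trace(module).
    trace_calls.append(name)
    return ("traced", module)


def register_fusion_pattern(name: str, factory: Callable[[], Any]) -> None:
    """Registering is cheap: nothing is built or traced here."""
    _PATTERN_FACTORIES[name] = factory
    # Drop cached results so a re-registered pattern is traced again.
    get_traced_pattern.cache_clear()


@functools.lru_cache(maxsize=None)
def get_traced_pattern(name: str) -> Any:
    """Trace a pattern on first use and reuse the result afterwards."""
    return _trace(name, _PATTERN_FACTORIES[name]())


register_fusion_pattern("mm_add", lambda: "MmAdd module placeholder")
calls_after_register = len(trace_calls)  # still 0: no tracing at import time

first = get_traced_pattern("mm_add")
second = get_traced_pattern("mm_add")   # served from the cache
```

With this layout, a program that never hits the memory-constrained path pays no tracing cost, while a program that does pays it once per pattern.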

narendasan · 2025-11-07T21:19:54Z

py/torch_tensorrt/dynamo/_defaults.py

 L2_LIMIT_FOR_TILING = -1
 USE_DISTRIBUTED_MODE_TRACE = False
 OFFLOAD_MODULE_TO_CPU = False
+CPU_MEMORY_BUDGET = -1


Use an Optional instead. Since this is not a TRT API, we don't need -1 to mean "let us decide".
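A minimal sketch of what this suggestion might look like, assuming `None` means "no budget was set". The helper `validate_cpu_memory_budget` is hypothetical and is not part of the PR; the sketch deliberately leaves the budget's unit unspecified.

```python
from typing import Optional

# None means "no budget was set"; no sentinel value such as -1 is needed.
CPU_MEMORY_BUDGET: Optional[int] = None


def validate_cpu_memory_budget(budget: Optional[int]) -> Optional[int]:
    """Return the budget unchanged, rejecting non-positive values (hypothetical helper)."""
    if budget is None:
        return None
    if budget <= 0:
        raise ValueError(f"cpu_memory_budget must be positive, got {budget}")
    return budget


def has_memory_budget(budget: Optional[int]) -> bool:
    """Callers branch on presence rather than comparing against a magic number."""
    return budget is not None


unset = validate_cpu_memory_budget(CPU_MEMORY_BUDGET)
explicit = validate_cpu_memory_budget(4096)
```

One side effect of this design is that a stray -1 left over from the old convention fails loudly instead of being silently interpreted.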

narendasan · 2025-11-07T21:20:17Z

py/torch_tensorrt/dynamo/utils.py

+    return psutil.Process().memory_info().rss / 1024 / 1024
+
+
+def release_memory() -> None:


Did this get moved?

narendasan · 2025-11-07T21:20:58Z

tests/py/dynamo/models/test_models.py

    torch._dynamo.reset()


+def compile_one(idx: int, ir: str):


Why is this test here?

narendasan · 2025-11-07T21:23:11Z

py/torch_tensorrt/dynamo/partitioning/_adjacency_partitioner.py

+
+    def size_of_subgraphs(self, subgraphs: List[Subgraph]) -> List[int]:
+        """
+        This function calculates the size of the subgraph.


Can you describe the algorithms here so we have a reference for later?

meta-cla bot added the cla signed label Nov 4, 2025

github-actions bot requested a review from peri044 November 4, 2025 20:05

github-actions bot requested changes Nov 4, 2025


narendasan reviewed Nov 4, 2025


cehongwang force-pushed the cpu-memory-graph-break branch from 7f0e504 to 18ccadf on November 5, 2025 22:03

cehongwang force-pushed the cpu-memory-graph-break branch from 18ccadf to f03ab2c on November 6, 2025 20:06